90 research outputs found

    A new weighted NMF algorithm for missing data interpolation and its application to speech enhancement

    In this paper, we present a novel weighted NMF (WNMF) algorithm for interpolating missing data. The proposed approach has a computational cost equivalent to that of standard NMF and, additionally, has the flexibility to control the degree of interpolation in the missing data regions. Existing WNMF methods do not offer this capability and, thereby, tend to overestimate the values in the masked regions. By constraining the estimates of the missing-data regions, the proposed approach allows for a better trade-off in the interpolation. We further demonstrate the applicability of WNMF and missing data estimation to the problem of speech enhancement. In this preliminary work, we consider the improvement obtainable by applying the proposed method to ideal binary mask-based gain functions. The instrumental quality metrics (PESQ and SNR) clearly indicate the added benefit of the missing data interpolation, compared to the output of the ideal binary mask. This preliminary work opens up novel possibilities not only in the field of speech enhancement but also, more generally, in the field of missing data interpolation using NMF.
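To make the masking idea concrete, the following is a minimal sketch of conventional weighted NMF with a binary mask, using the well-known multiplicative updates for the masked Euclidean cost. It is the baseline kind of WNMF the abstract contrasts against, not the proposed algorithm, whose interpolation constraint is not described here. All function names, the toy matrix, and the mask are assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def hadamard(a, b):
    """Element-wise product of two equally sized matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def masked_error(v, w, h, mask):
    """Squared error between v and w*h, counted only where mask == 1."""
    wh = matmul(w, h)
    return sum(m * (x - y) ** 2
               for rv, rwh, rm in zip(v, wh, mask)
               for x, y, m in zip(rv, rwh, rm))

def weighted_nmf(v, mask, rank, num_iters=200, eps=1e-9, seed=0):
    """Standard masked multiplicative updates (assumed baseline, not the paper's method)."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(rows)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(cols)] for _ in range(rank)]
    mv = hadamard(mask, v)  # observed entries only
    for _ in range(num_iters):
        # H <- H * (W^T (M.V)) / (W^T (M.(WH)))
        wt = transpose(w)
        num = matmul(wt, mv)
        den = matmul(wt, hadamard(mask, matmul(w, h)))
        h = [[hv * n / (d + eps) for hv, n, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(h, num, den)]
        # W <- W * ((M.V) H^T) / ((M.(WH)) H^T)
        ht = transpose(h)
        num = matmul(mv, ht)
        den = matmul(hadamard(mask, matmul(w, h)), ht)
        w = [[wv * n / (d + eps) for wv, n, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(w, num, den)]
    return w, h

# Toy non-negative matrix with two entries treated as missing (mask == 0).
v = [[float((i + 1) * (j + 1)) for j in range(5)] for i in range(4)]
mask = [[1.0] * 5 for _ in range(4)]
mask[0][1] = 0.0
mask[2][3] = 0.0

w0, h0 = weighted_nmf(v, mask, rank=2, num_iters=0)  # random initialisation only
initial_error = masked_error(v, w0, h0, mask)
w, h = weighted_nmf(v, mask, rank=2)
final_error = masked_error(v, w, h, mask)
estimate = matmul(w, h)  # masked entries are filled in by the low-rank model
```

The masked entries of `estimate` are determined purely by the low-rank model, with no control over how strongly they are interpolated; that missing control is the gap the abstract says the proposed method addresses.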

    Least squares DOA estimation with an informed phase unwrapping and full bandwidth robustness

    The weighted least-squares (WLS) direction-of-arrival estimator that minimizes an error based on interchannel phase differences is both computationally simple and flexible. However, the approach has several limitations, including an inability to cope with spatial aliasing and a sensitivity to phase wrapping. The recently proposed phase wrapping robust (PWR)-WLS estimator addresses the latter of these issues, but requires solving a nonconvex optimization problem. In this contribution, we focus on both of the described shortcomings. First, a conceptually simpler alternative to PWR is presented that performs comparably given a good initial estimate. This newly proposed method relies on an unwrapping of the phase-difference vector. Second, it is demonstrated that all microphone pairs can be utilized at all frequencies with both estimators. When incorporating information from other frequency bins, this permits a localization above the spatial aliasing frequency of the array. Experimental results show that a considerable performance improvement is possible, particularly for arrays with a large microphone spacing.
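As a rough illustration of the basic WLS idea, the sketch below fits a single arrival angle to interchannel phase differences under a simplified model: a far-field source, a linear array, and phase differences that are not wrapped. It does not implement the PWR-WLS estimator or the unwrapping method from the abstract. The model `phase = 2*pi*f*d*cos(theta)/c`, the function name, and all numeric values are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def wls_doa(phase_diffs, spacings, freqs, weights=None):
    """Weighted least-squares angle estimate (degrees) from unwrapped phase differences.

    Assumed model for each observation i: phase_i = a_i * cos(theta),
    with a_i = 2*pi*f_i*d_i / c. Minimising sum_i w_i * (phase_i - a_i*u)^2
    over u = cos(theta) gives u = sum(w*a*phase) / sum(w*a*a).
    """
    if weights is None:
        weights = [1.0] * len(phase_diffs)
    num = 0.0
    den = 0.0
    for phi, d, f, wgt in zip(phase_diffs, spacings, freqs, weights):
        a = 2.0 * math.pi * f * d / SPEED_OF_SOUND
        num += wgt * a * phi
        den += wgt * a * a
    u = max(-1.0, min(1.0, num / den))  # keep cos(theta) in its valid range
    return math.degrees(math.acos(u))

# Synthetic, noise-free observations for a source at 60 degrees.
true_angle_deg = 60.0
spacings = []
freqs = []
phase_diffs = []
for d in (0.02, 0.04, 0.06):           # microphone spacings in metres
    for f in (500.0, 1000.0, 1500.0):  # frequencies in Hz, below spatial aliasing here
        spacings.append(d)
        freqs.append(f)
        phase_diffs.append(2.0 * math.pi * f * d / SPEED_OF_SOUND
                           * math.cos(math.radians(true_angle_deg)))

estimated_angle_deg = wls_doa(phase_diffs, spacings, freqs)
```

In this sketch every phase difference stays within (-pi, pi), so no wrapping occurs. With larger spacings or higher frequencies the observed phases would wrap and this plain fit would break down, which is the limitation the abstract sets out to address.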

    Spectral refinement with adaptive window-size selection for voicing detection and fundamental frequency estimation

    Spectral refinement (SR) offers a computationally inexpensive means of generating a refined (higher resolution) signal spectrum by linearly combining the spectra of shorter, contiguous signal segments. The benefit of this method has previously been demonstrated on the problem of fundamental frequency (F0) estimation in speech processing – specifically for the improved estimation of very low F0. One drawback of SR is, however, the poorer detection of voicing onsets due to the Heisenberg-Gabor limit on time and frequency resolution. This may also lead to degraded performance in noisy conditions. Transitioning between long- and short-time windows for the spectral analysis may offer a good trade-off in these situations. This contribution presents a method to adaptively switch between short- and long-time windows (and, correspondingly, between the short-term and the refined spectrum) for voicing detection and F0 estimation. The improvements in voicing detection and F0 estimation due to this adaptive switching are conclusively demonstrated on audio signals in clean and corrupted conditions.

    Robust period estimation of automated cutting systems by improved autocorrelation & linear regression techniques

    Condition monitoring is an important asset in the industry to improve the safety and efficiency of the production chain. However, in heavy machinery – such as edge trimmers in steel mills – it is often impractical and unsafe to install intrusive sensors to get the data needed for condition monitoring. Non-intrusive monitoring techniques based, e.g., on acoustic data captured by microphones placed in the vicinity of the assembly being monitored are attractive options. Our application deals with the acoustic monitoring of rotational blades cutting steel strips at high speeds. Knowing the correct period of the cutting process is important for quality evaluation purposes. We propose two novel robust methods to estimate the periodicity based on the audio captured by a microphone near the blades. One is an improved autocorrelation function and the other is based on linear regression, both incorporating a novel test for the correctness of the estimated period. We compare our methods against the standard autocorrelation-based periodicity measurement techniques on real data recordings. The proposed method estimates the correct period about 87% of the time, compared to an accuracy of only 51% using standard periodicity measurement approaches.
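For orientation, here is a minimal sketch of the standard autocorrelation-based period estimate that the abstract uses as its baseline: compute the sample autocorrelation and pick the lag with the largest value inside a search range. It is not the improved autocorrelation function, the linear-regression method, or the correctness test proposed in the paper. The function names, the lag range, and the synthetic pulse train standing in for a cutting signal are assumptions.

```python
def autocorrelation(x, max_lag):
    """Biased sample autocorrelation of the mean-removed signal for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    centered = [value - mean for value in x]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        for lag in range(max_lag + 1)
    ]

def estimate_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    r = autocorrelation(x, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])

# Synthetic stand-in for a periodic cutting sound: a 3-sample pulse every 50 samples.
true_period = 50
signal = [1.0 if i % true_period < 3 else 0.0 for i in range(1000)]

period = estimate_period(signal, min_lag=10, max_lag=120)
```

On a clean signal like this one the peak lag matches the true period. On real recordings the largest peak can land on a multiple or fraction of the period, which is one reason such a baseline can be unreliable.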

    Exploiting temporal context in CNN based multisource DOA estimation

    Supervised learning methods are a powerful tool for direction of arrival (DOA) estimation because they can cope with adverse conditions where simplified models fail. In this work, we consider a previously proposed convolutional neural network (CNN) approach that estimates the DOAs for multiple sources from the phase spectra of the microphones. For speech, specifically, the approach was shown to work well even when trained entirely on synthetically generated data. However, as each frame is processed separately, temporal context cannot be taken into account. This prevents the exploitation of interframe signal correlations and of the fact that DOAs do not change arbitrarily over time. We therefore consider two different extensions of the CNN: the integration of a long short-term memory (LSTM) layer, or of a temporal convolutional network (TCN). In order to accommodate the incorporation of temporal context, the training data generation framework needs to be adjusted. To obtain an easily parameterizable model, we propose to employ Markov chains to realize a gradual evolution of the source activity at different times, frequencies, and directions throughout a training sequence. A thorough evaluation demonstrates that the proposed configuration for generating training data is suitable for the tasks of single- and multi-talker localization. In particular, we note that with temporal context, it is important to use speech, or realistic signals in general, for the sources. Experiments with recorded impulse responses and noise reveal that the CNN with the LSTM extension outperforms all other considered approaches, including the plain CNN and the TCN extension.
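The Markov-chain idea for training data can be illustrated with a minimal sketch: a two-state chain that switches one source between inactive and active from frame to frame, so that activity evolves gradually rather than being redrawn independently per frame. The paper's actual model also covers frequencies and directions and is not specified in the abstract; the function name, the two-state design, and the probabilities below are assumptions.

```python
import random

def simulate_activity(num_frames, p_stay_active, p_stay_inactive, seed=0):
    """Simulate a 0/1 activity sequence with a two-state Markov chain.

    The chain starts inactive (0). In each frame it keeps its current state
    with the given stay probability and otherwise flips to the other state.
    """
    rng = random.Random(seed)
    state = 0
    states = []
    for _ in range(num_frames):
        states.append(state)
        p_stay = p_stay_active if state == 1 else p_stay_inactive
        if rng.random() >= p_stay:
            state = 1 - state
    return states

# High stay probabilities yield long, slowly changing active/inactive runs.
activity = simulate_activity(num_frames=200, p_stay_active=0.95, p_stay_inactive=0.9)

# Degenerate settings make the behaviour easy to check by hand.
always_inactive = simulate_activity(10, p_stay_active=1.0, p_stay_inactive=1.0)
alternating = simulate_activity(6, p_stay_active=0.0, p_stay_inactive=0.0)
```

The two stay probabilities are the only parameters, which reflects the "easily parameterizable" aim stated in the abstract: raising them lengthens the typical run of active or inactive frames.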

    A robust sequential hypothesis testing method for brake squeal localisation

    This contribution deals with the in situ detection and localisation of brake squeal in an automobile. As brake squeal is emitted from regions known a priori, i.e., near the wheels, the localisation is treated as a hypothesis testing problem. Distributed microphone arrays, situated under the automobile, are used to capture the directional properties of the sound field generated by a squealing brake. The spatial characteristics of the sampled sound field are then used to formulate the hypothesis tests. However, in contrast to standard hypothesis testing approaches of this kind, the propagation environment is complex and time-varying. Coupled with inaccuracies in the knowledge of the sensor and source positions as well as sensor gain mismatches, modelling the sound field is difficult and standard approaches fail in this case. A previously proposed approach implicitly tried to account for such incomplete system knowledge and was based on ad hoc likelihood formulations. The current paper builds upon this approach and proposes a second approach, based on more solid theoretical foundations, that can systematically account for the model uncertainties. Results from tests in a real setting show that the proposed approach is more consistent than the prior state-of-the-art. In both approaches, the tasks of detection and localisation are decoupled for complexity reasons. The localisation (hypothesis testing) is subject to a prior detection of brake squeal and identification of the squeal frequencies. The approaches used for the detection and identification of squeal frequencies are also presented. The paper further briefly addresses some practical issues related to array design and placement. (C) 2019 Author(s)